97
Table 8.1 Degree of difficulty of a P problem (sequence alignment) compared to an NP problem
(protein folding)
Algorithm
Runtime complexity (m, n = sequence length of a, b)
Heuristic algorithms:
Blast
O(n*m)
Dynamic algorithms:
Needleman-Wunsch
Cubic: O(n3); e.g. at 5 = 125
Smith-Watermann
Quadratic: O(n2); e.g. with 5 = 25
Protein folding:
For x possible folds
Exponential: xn (e.g. for 2 convolutions: 2n; for 7
convolutions: 7n)
8.1
8.3
Informatic Solutions for Computationally Intensive
Bioinformatics Problems
Many interesting problems in biology and bioinformatics have a built-in combinatorics
and thus a very large, difficult to understand solution space, which therefore has the diffi
culty NP (solution very difficult to find and computation time not foreseeable - if you show
me the solution, I can usually confirm it relatively quickly). All in all, however, computa
tional time problems are computer science problems, which can therefore also be tackled
directly with tools from computer science and computer technology.
Tip 1: Use Modern Computer
This is often effective in practice. First, if you have a difficult or computationally intensive
bioinformatics problem, you should not use a web server (otherwise you might wait until
you black out!). However, most bioinformaticians have already taken this into account
when designing their programs. Protein structure predictions, for example, are often not
done online on the web server, but one receives (after a few hours or even days) the result
by e-mail (for example, when using SWISS-MODEL for homology models or ab-initio
predictions by the QUARK software from the Zhang lab). For own calculations I should
first use a notebook or PC as up-to-date as possible. Workstations or small computer clus
ters have even more computing power at first. For larger calculations, local (university
mainframes) or central computer clusters (e.g. Leibniz Computing Centers in Munich,
etc.) are then available. Tier 1 or Tier 0 mainframes such as JUQUEEN in Jülich then
provide the greatest performance (6 million billion floating point operations per second)
with 5.9 petaflops per second (https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercom
puters/JUQUEEN/JUQUEEN_node.html).
8.3 Informatic Solutions for Computationally Intensive Bioinformatics Problems